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Abstract 

Background: Gene duplication and subsequent functional divergence especially expression divergence have been 
widely considered as main sources for evolutionary innovations. Many studies evidenced that genetic regulatory 
network evolved rapidly shortly after gene duplication, thus leading to accelerated expression divergence and 
diversification. However, little is known whether epigenetic factors have mediated the evolution of expression 
regulation since gene duplication. In this study, we conducted detailed analyses on yeast histone modification (HM), 
the major epigenetics type in this organism, as well as other available functional genomics data to address this issue. 

Results: Duplicate genes, on average, share more common HM-code patterns than random singleton pairs in their 
promoters and open reading frames (ORF). Though HM-code divergence between duplicates in both promoter and ORF 
regions increase with their sequence divergence, the HM-code in ORF region evolves slower than that in promoter 
region, probably owing to the functional constraints imposed on protein sequences. After excluding the confounding 
effect of sequence divergence (or evolutionary time), we found the evidence supporting the notion that in yeast, the 
HM-code may co-evolve with cis- and trans-regulatory factors. Moreover, we observed that deletion of some yeast 
HM-related enzymes increases the expression divergence between duplicate genes, yet the effect is lower than the case 
of transcription factor (TF) deletion or environmental stresses. 

Conclusions: Our analyses demonstrate that after gene duplication, yeast histone modification profile between 
duplicates diverged with evolutionary time, similar to genetic regulatory elements. Moreover, we found the evidence of 
the co-evolution between genetic and epigenetic elements since gene duplication, together contributing to the 
expression divergence between duplicate genes. 

Keywords: Histone modification, Histone modification code divergence, Gene duplication, Expression divergence, 
Epigenetic divergence, c/s-regulation, trans-regulation 



Background 

Although gene duplication has been widely considered 
as the main source of evolutionary novelties [1-4], the 
issue of duplicate gene preservation remains a subject of 
hot debate, i.e., how duplicate copies can escape from 
being pseudogenized and then evolve from an initial 
state of complete redundancy to a steady-stable state 
that both functionally divergent copies are maintained 
by purifying selection. As one of plausible hypotheses, 
rapid expression divergence between duplicate genes 
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may be the first step that is fundamental for the preser- 
vation of redundant duplicates [3]. There are three recog- 
nized types involved in expression regulatory mechanism: 
(i) ds-regulation of transcription mediated by promoters, 
enhancers, silencers, etc.; (ii) £ra#s-regulation mediated by 
regulatory proteins binding to cis elements, such as tran- 
scription factors (TF); and {Hi) epigenetic regulation, such 
as DNA methylation, specific histone modification pattern 
of genes (histone code hypothesis). Many studies have 
addressed the effect of cis- and trans- regulatory mechan- 
isms on the expression divergence after gene duplication, 
e.g., ds-regulatory motif, TF-binding interaction, trans- 
acting expression quantitative trait loci (eQTLs) [5-10]. 
Though there is increasing evidence that epigenetic 
changes may play important roles in the initial expression 
divergence between duplicate genes [11-15], little study 
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has been done about how regulatory network between du- 
plicate genes evolve at epigenetic level 

Epigenetic regulation on gene expression is a highly 
complex process. In a broad sense, it includes DNA 
methylation, histone modification, nucleosome occu- 
pancy, as well as microRNA [11]. Moreover, these epi- 
genetic elements can interact with each other, for 
instance, the reciprocity between DNA methylation and 
histone modification [16,17]. The complexity of epigen- 
etic regulation has made it difficult to explore its role in 
regulatory divergence between duplicates. Nevertheless, 
we have recognized budding yeast (Saccharomyces cere- 
visiae) as an ideal organism for our purpose, because its 
epigenetic regulation is relatively simple: DNA cytosine 
methylation and microRNA were not detected [18,19]. 
In other words, histone modification is the main repre- 
sentation for epigenetic modification in the budding 
yeast, less affected by other epigenetic modification 
types. Therefore, in the present study, we focus on the 
evolution of histone modification (HM) between yeast 
duplicate genes. 

Eukaryotic DNA with a unit of 146 bp wound around a 
histone octamer (two copies of each core histone H2A, 
H2B, H3, H4) is assembled into chromatin. Histone N- 
terminal tails are subject to multiple covalent post- 
translational modifications, including lysine (K) acetylation, 
lysine or arginine (R) methylation, serine (S) phosphoryl- 
ation, and so on [20,21]. Enormous possible combinations 
and interactions of histone modification types constitute 
histone code. The 'histone code' hypothesis claims that a 
specific pattern of hisone modification code can produce a 
specific effect on local chromatin structure, modulating 
DNA accessibility, and consequently regulating transcrip- 
tion and other DNA-based biological processes [20-24]. 

Our goal is to investigate the pattern of yeast histone 
modification (HM) code divergence between duplicates. 
Our hypothesis claims (i) that, when a gene is dupli- 
cated, the gene-specific histone modification profile is 
also duplicated, so on average duplicate pairs tend to 
have a higher degree of HM code similarity than ran- 
domly selected singleton gene pairs; and (ii) that since 
duplication, HM-code profile between duplicate copies 
become divergent with evolutionary time. We test these 
two predictions by conducting genome- wide analyses, as 
well as genes involved in different biological functions. 
Moreover, we are particularly interested in whether gen- 
etic regulatory elements, including ds-motifs (such as 
TATA box) and transcription factors, and epigenetic 
HM-code profile co-diverge during the evolution since 
gene duplication. To this end, time-dependent con- 
founding factors in both genetic and epigenetic factors 
need to be ruled out. Finally, the significance of our 
study for having a better understanding of regulatory 
evolution after gene duplication is discussed. 



Results 

Combinational interactions of histone-modifying enzymes 
(HATs, HDACs, HMTs, HDMs, etc.) to histone N-tail 
produce numerous types of post-translational modifica- 
tion of histones (H2A, H2B, H3, H4), such as methylation, 
acetylation [25]. Moreover, the same modification site can 
be affected by different modifying enzymes, and vice versa, 
generating different histone modification combinations, 
like H3K4me2 and H3K4ac (dimethylation and acetylation 
in Lys4 of histone H3, respectively), or H4K8ac (acetyl- 
ation in Lys8 of histone H4). In this study, histone modifi- 
cation (HM) code of a gene represents the combined 
profile of different HM sites, HM types and HM states in 
gene promoter and open reading frame (ORF) regions, 
respectively (see Methods). We believe that the HM code 
of a gene reveals the pattern of HM mediated regulatory 
network of that gene. 

Functional redundancy in histone modification (HM) 
between yeast duplicate genes 

Because of evolutionary relatedness, duplicate pairs may 
have a higher similarity of histone modification (HM) 
code than two randomly selected single-copy genes (sin- 
gletons). To test this hypothesis, we compared the dis- 
tance of HM code between duplicate pair and randomized 
singleton pair. To be simple we choose one minus Pear- 
sons product- moment correlation coefficient, i.e., D HM = 
1-r, to define the distance of HM code. The larger the 
value of D HM , the higher divergence of histone modifica- 
tion code between duplicate genes. Specifically, denote the 
distance of HM code associated in promoter and ORF 
regions by D HM _ P and D HM _ Q) respectively. Randomized 
pairs were selected from single copy genes with 10000 
repeats. As expected, we observed that both D HM _ P and 
Dhm-o measures show a lower divergence degree of his- 
tone modification code between duplicate genes than that 
of randomized singleton pairs (Wilcoxon rank sum test: 
P<10" 15 for both cases; Figure 1A). 

Generally speaking, local chromatin environment around 
genes is one of important components leading to different 
histone modification code of each gene. While chromatin 
environment differs in different chromosomes, duplicate 
genes locating in different chromosomes may be under dif- 
ferent chromatin environment, possessing the chromosome- 
specific HM profile. Following this argument, we classified 
all yeast gene pairs (duplicate and randomized singleton 
pairs) under study into two groups: they are located in the 
same chromosome, or different chromosomes, and tested 
the relationship between histone modification pattern and 
chromosome condition. We observed that though the HM 
profile divergence is not significantly correlated with loca- 
tion distance of gene pairs in the same chromosome (Pear- 
sons product-moment correlation: r = 0.06, P = 0.09 for 
promoter region and r = 0.02, P = 0.45 for ORF region), 



Zou et al. BMC Evolutionary Biology 201 2, 12:1 1 1 
http://www.biomedcentral.eom/1471-2148/12/1 1 1 



Page 3 of 12 




Promoter ORF Promoter ORF 

Figure 1 Comparison of the histone modification divergence between yeast genes. Comparison of the histone modification divergence 
between (A) duplicate genes and randomized pairs of single-copy genes, and (B) all pairs of genes in the same and different chromosomes. 
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gene pairs locating in the same chromosome share more 
common HM code than that in different chromosomes 
(Wilcoxon rank sum test: P = 0.07 and P = 0.01 for pro- 
moter and ORF regions, respectively; Figure IB). 

The chromosome effect on the HM-code divergence 
between duplicate genes may suggest an alternative inter- 
pretation about a higher similarity of HM distance between 
duplicate genes than random pairs. That is, duplicate pairs 
tend to be located in the same chromosome because of tan- 
dem gene duplications, compared to randomized singleton 
pairs (Chi-squared test: 17.2, d,f. = l, P< 10" 4 ). To rule 
out this possibility, we chose duplicate pairs and rando- 
mized singleton pairs where both copies are belonging to 
different chromosomes, and observed the similar result to 
Figure 1A (see Additional file 1: Figure SI). Hence, we con- 
clude that duplicate genes represent their functional redun- 
dancy at the level of histone modification mediated 
regulatory network. 

The evolution of HM is coupled with coding sequence 
divergence after gene duplication 

We further expect that functional redundancy in histone 
modification code of duplicate genes as shown in 
Figure 1A would maintain a high degree in recently dupli- 
cated genes, and low in ancient duplicates. To verify this 
claim, we investigated the relationship between the 
distance of HM code (D HM . P or D HM , 0 ) and coding 
sequence divergence (the synonymous distance K s or the 
nonsynonymous distance K A between duplicate genes). 
Considering the statistically unreliable estimation of syn- 
onymous or nonsynonymous substitution distance when 
Ks or K A becomes larger because of repeated substitutions 
at the same site, we selected duplicate pairs with K s < 2.0 
and /<"^<0.5 for this analysis. We observed that both 
Dhm-p and D HM . Q are positively correlated with K s or K A 
(Pearsons product-moment correlation: all r>0.45, 
P < 10" 15 for all data points; Figure 2), suggesting that the 



divergence of HM code is coupled with the coding se- 
quence divergence between duplicate genes. To avoid 
correlated data points bringing the bias, we selected 
independent pairs of duplicate genes using the method 
from Zou et al. [9] and reanalyzed. The similar result 
remains hold (all r > 0.40, P < 10" 13 for all data points; 
in Additional file 1: Figure S2). 

Considering K s or K A as a proxy to evolutionary time 
since gene duplication, we suggest that the correlation 
between the HM-code divergence and the coding 
sequence divergence has been mainly driven by muta- 
tions accumulated with evolutionary time. Our interpret- 
ation is based on two reasons: First, we observed a weak 
negative correlation between the HM divergence and the 
K A IKs ratio of duplicate genes (Pearsons product- 
moment correlation: r = -0.12, P<10' 7 for D HM , P and 
r=-0.05, P = 0.04 for D HM _ Q ). As the K A IK S ratio is an 
indicator of sequence conservation in coding region, our 
finding implies that duplicate genes with stringent func- 
tional constraints on coding sequence may have greater 
divergence in the HM code, but the effect is marginal. Sec- 
ond, promoter HM code of duplicate genes diverges much 
quicker than that of ORF HM code (Wilcoxon rank sum 
test: P<10~ n ; Fi gure 2), while significant but weak differ- 
ence of Dhm-p and D HM _ Q in randomized singleton pairs 
(Wilcoxon rank sum test: P = 0.02; Figure 1). Some factors 
may be involved to accelerate the divergence of promoter 
HM code between duplicate genes, such as the evolution of 
transcription factors (TF) shared by duplicate genes. 

Co-evolution of the HM-code divergence between 
duplicates with several genetic regulatory elements 

The interaction between epigenetic and genetic elements 
in gene regulation has been increasingly acknowledged 
[21,26], raising an interesting question whether the diver- 
gence of histone modification code between duplicate 
genes co-evolves with some trans-acting factors binding 
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to those duplicate genes. We first studied the relationship 
between the distance of promoter HM code (D HM _ P ) and 
the distance of transcription factors (D TF ) and £raws-acting 
expression quantitative trait loci (eQTLs) (D t , e Q TL ) be- 
tween duplicate genes. Traws-acting eQTLs of one gene 
represent all £raws-regulatory proteins for its transcription, 
not limited to transcription factors. Two distance mea- 
sures D TF and D t _ e Q TL were determined by Czekanowski- 
Dice formula (Methods). We found that they are signifi- 
cantly correlated (Peasrons product-moment correlation: 
r>0.25, P<10' 13 ; Figure 3A, 3C). Similar results were 
obtained in the case of ORF HM code (Figure 3B, 3D). 

As both Dhm-p and D HM _ Q , as well as D t _ e Q TL and D TB 
increase with evolutionary time (K s or K A as the proxy) 
[Figure 2; 9], it is reasonable to suspect that K s or K A may 
underlie these statistically significant correlations between 
the HM code and genetic regulatory elements. We con- 
ducted the partial correlation in D HM _ P -D TF and D HM _ P - 
Dt-eQTL of duplicate genes under the controlling of K s and 
K A variables (with the restriction of K s < 2.0 and K A 
< 0.5), and still observed the significant relationship 
(r = 0.18, P< 0.001 for D HM „ P -D TF and r=0.15, P<0.05 
for D HM _ P -D t _ e Q TL ), though they are relatively weak. In 
short, our analysis provides the evidence that the histone 
modification code and £raws-regulators shared by dupli- 
cate genes may have co-evolved since gene duplication. 

Moreover, we design the following analysis to further 
explore the co-evolution between the HM code and trans- 
regulators (TF or trans -acting eQTL), by dividing yeast 
genes into two categories, trans -targeted genes and con- 
trolling genes. Trans -targeted genes are genes that are tar- 
geted by transcription factors (TF- targeted genes) or have 
at least one trans -acting eQTL (trans-eQTL acting genes) 
and the rest of genes are controlling genes (Methods). We 
totally obtained 4495 trans -targeted genes and 2226 con- 
trolling genes (Figure 4A). Interestingly, both promoter 
and ORF HM code distances of duplicates in the group of 
trans -targeted genes are, on average, significantly higher 



than those in the group of controlling genes (Figure 4B) 
(Wilcoxon rank sum test: promoter, P = 0.003 and ORF, 
P = 0.0001). It should be noticed that the pattern we 
observed in Figure 4B would not be affected by the strong 
correlation between the HM divergence and evolutionary 
time (K s as the proxy) of duplicate genes, because the dis- 
tribution of K s has been found no significant difference 
between trans -targeted genes and controlling genes (Wil- 
coxon rank sum test: P = 0.08). 

The HM-code divergence and TATA-box regulation 

TATA box is the core promoter element for gene regula- 
tion responding to environmental stresses [27,28]. To 
examine the role of TATA box in the HM-code divergence 
between duplicate genes, we divided all yeast duplicate 
pairs into three groups, TATA-containing (both have 
TATA-box), TATA-less (both do not have TATA-box), 
and TATA_diverge (only one copy has TATA-box). We 
compared the HM-code distance in promoter (D HM _ P ) 
and ORF (D HM _ 0 ) region between duplicate genes in these 
groups. Interestingly, both D HM _ P and D HM _ Q show the 
highest degree in the TATA-diverge group, and the lowest 
in the TATA-containing group (Figure 5) (Wilcoxon rank 
sum test: promoter, P< 10" 10 ; ORF, P< 10" 15 ). 

Our explanation is as follows. Note that TATA- 
containing genes may interact with some specific chro- 
matin modification factors to regulate gene expression 
[29]. In the TATA_diverge group, only one duplicate 
with TATA-box has such interaction, resulting in a 
higher HM-code divergence between them. By contrast, 
in the case of TATA-containing group, both duplicates 
with TATA-box have similar interactions, resulting in a 
higher HM-code similarity between them. 

Biological functions and the HM-code divergence 
between duplicate genes 

Do different biological functions affect the level of the 
HM-code divergence between duplicate genes? We used 
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Figure 3 Relationship between the distance of histone modification pattern and frans-regulators shared by duplicate genes. (A) and (B) 

show the correlation between the distance of transcription factors shared by duplicate genes and the divergence of promoter and ORF histone 
modification pattern between duplicate genes, respectively, while (C) and (D) represent the interrelationship between the distance of trans-acting 
eQTLs targeted to the duplicate genes and the distance of promoter and ORF histone modification pattern between duplicate genes, 
respectively. 



GO (Gene Ontology) Slim (biological process with 37 cat- 
egories; see Methods) to address the issue. We analyzed 
Dhm-p an d D HM _ Q of duplicate genes in each GO category 
and compared D HM _ P and D HM _ Q among these GO groups 
by one-way analysis of variance (ANOVA). Our finding is 
that duplicate pairs with different biological processes dif- 
fer significantly in D HM _ P and D HM _ Q (P < 10" 13 for D HM _ P , 
P < 10~ 15 for D HM _o, Table 1). Use D HM _ P for an example, 
duplicate genes involved in cofactor metabolic process, 
cellular amino acid and derivative metabolic process, cell 
cycle and cytoskeleton organization may have greater 
HM-code divergence, i.e., a higher D HM _ P , while duplicate 
genes involved in translation, DNA metabolic process, 



pseudohyphal growth and transcription may have lower 
divergence of the promoter HM code. Moreover, we con- 
ducted a similar analysis on 24 cellular components (GO 
Slim classification, see Methods), and observed that dupli- 
cate genes in different subcellular localization also 
undergo the different evolutionary rate of histone modifi- 
cation code after gene duplication (Table 2). It is possible 
that duplicate genes in some biological processes or sub- 
cellular localization are young (measured by small K s ) 
while others are old (with high K s ). Since D HM _ P and D HM _ 
o are positively correlated with K s (Figure 2), we have to 
remove the confounding effect caused by K s . We used the 
analysis of covariance (ANCOVA) {D HM _ P or D HM _ Q ~ 



Zou et al. BMC Evolutionary Biology 201 2, 12:1 1 1 
http://www.biomedcentral.eom/1471-2148/12/1 1 1 



Page 6 of 12 




Figure 4 Different evolutionary rate of the histone modification code for duplicate genes in trans-targeted genes and controlling 
genes. (A) The information of different category genes, trans-targeted genes (JF-targeted genes and trans-eQTL acting genes) and controlling 
genes; (B) Comparison of the promoter histone modification distance {D HM . P ) and the ORF histone modification distance {D HM _ Q ) of duplicate 
genes between trans-targeted genes and controlling genes. Here, trans-targeted genes are the union of TF-targeted genes and trans-eQTL acting 
genes. 

V J 



K s + T (biological process or subcellular localization) + K s : 
T) and the result remains statistically highly significant 
(Table 1, 2), suggesting K s is not a confounding factor that 
may affect our analyses. 

The expression divergence under genetic, epigenetic and 
stressful perturbations 

It is well-documented that gene expression divergence 
of duplicate genes increases with evolutionary time, but 
the underlying mechanism remains a subject of debate 
[30]. The analysis we describe below is to know whether 
the divergence of HM-mediated regulatory network 
affects the expression divergence between duplicate 
genes. We compared the interrelation between the his- 
tone modification pattern distance (D HM _ P , D HM _ Q ) and 
the expression distance (E) between duplicate genes, and 
found that they are significantly correlated (Pearsons 
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Figure 5 The evolution of promoter or ORF histone 
modification pattern of duplicate genes which have different 
status of TATA box. 



product-moment correlation; D HM _ P ~E\ r = 0.24, P < 10" 
15 ; D HM _ 0 'E: r = 0.30, P< 10 15 ; Figure 6). 

We further divided yeast expression profiles into four 
types from different perturbation conditions: 1) normal 
developmental or physiological conditions, defined as 
'Normal' treatment; 2) a set of conditions where expres- 
sion changes attribute to environmental stresses, denoted 
as 'Stress' treatment; 3) conditions where single gene cod- 
ing chromatin modifiers (CM) like SWI/SNF, HDACs, 
HATs, etc. was deleted, denoted as < CM_del' treatment; 4) 
conditions where single gene coding transcription factors 
(TF) was deleted, denoted as 'TF_del' treatment. The lat- 
ter two types are able to test the effect of chromatin modi- 
fication related proteins and transcription factors on other 
genes, respectively. We then calculated the expression 
divergence between duplicate genes under these four types 
of conditions, denoted by E Normal , E Stress , E CM deh E T¥ deh 
respectively. We observed that the expression distance 
between duplicate genes in < CM_del' condition {E CM de i) 
is significantly greater than that in 'Normal' condition 
(^Normal) (Wilcoxon rank sum test: P < 10" 10 ; Figure 7A), 
but much lower than that in 'Stress' and 'TF_del' condi- 
tions (^stress and £xF_dei) (Wilcoxon sum rank test, P < 10" 
15 ; Figure 7A). Results imply that histone modification 
related enzymes and gene associated histone modification 
profile may indeed influence the expression evolution of 
duplicate genes, though the relative contribution to the 
expression divergence between duplicate genes is highly 
lower than genetic related factors like transcription fac- 
tors, even externally environmental stresses. 

TATA-containing genes are usually enriched in stress- 
related genes [29], and represent high expression variability 
[31,32]. In our study, we detected that in four disturbed 
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Table 1 The divergence of histone modification code associated with promoter or ORF region of duplicate genes 
involving in 37 biological processes 



Biological Process Dhm-p D hm . 0 





Mean 


S.E. a 


Mean 


S.E. 


Cofactor metabolic process 


1.089857 


0.099487 


0.94615 


0.079585 


Cellular amino acid and derivative metabolic process 


1.056124 


0.075125 


0.735075 


0.058636 


Cell cycle 


1 .022464 


0.070499 


0.628888 


0.045862 


Cytoskeleton organization 


1.011731 


0.100702 


0.732692 


0.068568 


Carbohydrate metabolic process 


0.957513 


0.041359 


0.801785 


0.03478 


Cell wall organization 


0.943616 


0.078784 


0.919398 


0.069325 


Membrane organization 


0.943309 


0.067702 


0.786555 


0.069908 


Lipid metabolic process 


0.93571 


0.079815 


0.731593 


0.059279 


Generation of precursor metabolites and energy 


0.928247 


0.096518 


0.882262 


0.072135 


Anatomical structure morphogenesis 


0.921051 


0.079036 


0.777731 


0.073299 


Protein modification process 


0.900252 


0.038873 


0.854947 


0.030103 


Protein folding 


0.895056 


0.060764 


0.677196 


0.05632 


Response to stress 


0.891941 


0.051014 


0.766131 


0.04367 


Protein catabolic process 


0.877025 


0.072138 


0.74283 


0.065929 


Vesicle-mediated transport 


0.867052 


0.045818 


0.763747 


0.04406 


Cytokinesis 


0.855475 


0.128721 


0.458555 


0.075571 


Vitamin metabolic process 


0.844587 


0.123271 


0.697339 


0.095067 


Transport 


0.819455 


0.026288 


0.726211 


0.021028 


Heterocycle metabolic process 


0.812252 


0.080271 


0.737343 


0.070763 


Meiosis 


0.803064 


0.204095 


0.586892 


0.16534 


Cell budding 


0.798157 


0.088983 


0.814543 


0.12039 


Signal transduction 


0.796985 


0.069456 


0.846847 


0.05721 


RNA metabolic process 


0.792216 


0.043033 


0.562501 


0.032646 


Cellular aromatic compound metabolic process 


0.783433 


0.116406 


0.685336 


0.092675 


Ribosome biogenesis 


0.745581 


0.078664 


0.489313 


0.057953 


Cellular respiration 


0.741278 


0.175087 


0.970903 


0.130576 


Conjugation 


0.741105 


0.115964 


0.561448 


0.065045 


Cellular homeostasis 


0.732899 


0.107582 


0.622864 


0.080052 


Sporulation resulting in formation of a cellular spore 


0.726966 


0.119265 


0.430464 


0.083954 


Organelle organization 


0.709827 


0.043332 


0.569849 


0.029456 


Response to chemical stimulus 


0.675718 


0.055068 


0.590837 


0.045851 


Nucleus organization 


0.668734 


0.212327 


0.396937 


0.134765 


Transcription 


0.586723 


0.061277 


0.490756 


0.047904 


Pseudohyphal growth 


0.523636 


0.183077 


0.741094 


0.154572 


DNA metabolic process 


0.520462 


0.057569 


0.403555 


0.03735 


Translation 


0.475206 


0.058331 


0.32502 


0.050416 


Other 


0.803085 


0.04149 


0.791027 


0.036751 


ANOVA test 


F=3.91, P< 10" 13 




F=7.60,P<10" 15 




ANCOVA test 


F=2.24,P<10" 4 




F=4.78,P<10" 15 




a : S.E. denotes the standard error. 
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Table 2 The divergence of histone modification code associated with promoter or ORF region of duplicate genes 
involving in 24 cellular components 

Cellular Component D HM _ P D HM . 0 





Mean 


S.E. 


Mean 


S.E. 


Cytoplasm 


0.815386 


0.022075 


0.691933 


0.018111 


Mitochondrion 


0.804229 


0.045602 


0.775065 


0.037205 


Ribosome 


0.401566 


0.051071 


0.21871 


0.044488 


Nucleus 


0.757928 


0.03278 


0.633467 


0.025154 


Membrane 


0.867299 


0.045905 


0.719912 


0.037068 


Mitochondrial envelope 


0.850156 


0.071138 


0.845782 


0.058631 


Nucleolus 


1.041006 


0.088386 


0.885202 


0.076221 


Chromosome 


0.848459 


0.121672 


0.500084 


0.069019 


Membrane fraction 


0.728517 


0.057793 


0.49282 


0.052274 


Cell wall 


0.703015 


0.05532 


0.397484 


0.060315 


Vacuole 


0.560115 


0.090545 


0.725808 


0.070671 


Golgi apparatus 


1 .020564 


0.07273 


0.856226 


0.060742 


Cytoplasmic membrane-bounded vesicle 


0.780526 


0.149402 


0.703323 


0.11963 


Endomembrane system 


0.77575 


0.132578 


0.712372 


0.098021 


Cellular bud 


0.877352 


0.094858 


0.65631 


0.069712 


Cytoskeleton 


0.883771 


0.094073 


0.589399 


0.07159 


Microtubule organizing center 


0.664255 


0.193299 


0.553038 


0.177284 


Site of polarized growth 


0.875764 


0.0889 


0.636855 


0.064332 


Endoplasmic reticulum 


0.877175 


0.081264 


0.673596 


0.056244 


Plasma membrane 


0.72374 


0.034621 


0.70187 


0.027569 


Peroxisome 


0.500936 


0.245856 


0.827018 


0.120535 


Extracellular region 


0.553479 


0.167985 


0.224751 


0.129525 


Cell cortex 


1 .00632 


0.081755 


0.650496 


0.076325 


Other 


0.752484 


0.047213 


0.632212 


0.038696 


ANOVA test 


F=3.54,P<10" 7 




F = 6.53, P< 10~ 15 




ANCOVA test 


F= 1.77, P = 0.01 




F = 4.43, P< 10~ 10 





conditions (Normal, Stress, CM-del, TF_del), the expression 
divergence in TATA-diverge and TATA-containing groups 
are significantly larger than that in TATA-less duplicate 
genes (Figure 7B). The discrepancies between expression 
change and the histone modification divergence in these 
duplicate gene types (TATA-containing, TATA-less, TATA- 
diverge) are observed. 

Discussion 

The divergence of HM profile between duplicate genes 

Our detailed analyses on yeast whole-genome histone 
modification (HM) code profile have shown that duplicate 
genes share more common HM-code patterns than rando- 
mized singleton pairs in their promoter and ORF regions, 
and the HM-code divergence between duplicates in both 
regions increase with the sequence divergence. This finding 
supports the notation that epigenetic divergence between 



duplicate genes may have been driven by the accumulation 
of mutations with the duplication time in both their pro- 
moter and ORF regions, because it has been shown that 
the divergence of coding sequence such as K s between 
yeast duplicates is a proxy to evolutionary time. In other 
words, no external driving force is needed to explain the 
HM profile divergence between duplicates, though it 
remains possible of neofunctionalization based upon di- 
vergent HM profile through some positive selection 
mechanisms. 



Hypothesis-based genomic correlation analysis 

Genome-wide functional analysis of duplicate genes is in 
attempt to reveal functional correlation between genetic 
and epigenetic factors during the process of functional 
innovation through gene duplication. However, the uni- 
versal confounding effect of the sequence divergence 
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Figure 6 Correlation between the distance of histone modification pattern and the expression divergence (E) of duplicate genes. 

Relationship between the expression distance and (A) the promoter histone modification divergence {D HM . P ) and (B) the ORF histone 
modification divergence (D HM . 0 ). 



(K s ) has complicated the practical analyses. Conse- 
quently, the controversy about the cause-effect interpret- 
ation has been inevitable, because genome-wide analysis 
of duplicate genes has been viewed as an exploration ra- 
ther than a hypothesis -testing approach. 

Since the correlation between the HM profile diver- 
gence and the sequence divergence actually reflects the 
fundamental evolutionary process driven by the accumu- 
lation of mutations, we view this as a null hypothesis in 
the genome-wide functional analysis of duplicate genes. 
That is, any meaningful inference about the functional 
correlation within or between genetic and epigenetic ele- 
ments needs to reject this null hypothesis, as we have 
shown in this study. 



Interaction between genetic and epigenetic elements: 
who is the driver? 

The interaction between epigenetic and genetic elements 
in gene regulation has been increasingly acknowledged. 
For instance, the establishment of the histone modifica- 
tion code may partially involve the recruitment of specific 
histone modifying enzymes such as HATs, HDACs, HMTs 
by transcription factors (TF) [26]. Meanwhile, the dis- 
tinctly combinatory histone modification code associated 
with gene may also provide specific binding code that is 
read by other transcription factors [21], These observa- 
tions raise an interesting question whether the divergence 
of histone modification code between duplicate genes may 
co-evolve with £raws-acting factors binding to those 
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Figure 7 Comparison of the expression divergence between duplicate genes under different disturbing conditions. (A) The expression 
divergence between all duplicate genes; (B) The expression divergence between duplicate genes with different status of TATA box. P-value was 
obtained from Wilcoxon rank sum test using the open-source R statistical analysis language. 
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duplicate genes, e.g., transcription factors, histone modify- 
ing enzymes. 

We have observed a higher divergence for HM code of 
duplicate genes in the category of trans -targeted genes 
than that of controlling genes, suggesting that the change 
of £ra/?s-regulators binding to duplicate genes may affect 
the pattern of HM code in both promoter and ORF 
regions, and thus accelerating the divergence of histone 
modification code between duplicate genes. Yet, it remains 
unclear about the cause-effect relationship. For instance, 
does the divergence of trans-regulators to duplicate genes 
facilitate the divergence of histone modification between 
duplicate genes, or vice versa! Our further study will 
address this issue. 

Functional preference in the HM code divergence 
between duplicate genes 

We observed the functional bias on the HM-code diver- 
gence after gene duplication. One possibility is that histone 
proteins associated with genes in different biological func- 
tions may be subject to differentially post-translational 
modification, leading to different divergence rate of histone 
modification code between duplicate genes. Since histone 
modification process of one gene largely depends on its 
chromatin environment and DNA sequence interacted by 
histone modifying enzymes [33], functionally selective con- 
straints may also be imposed on histone modification evo- 
lution associated with that gene, a situation similar to 
DNA sequence evolution. 

Conclusions 

In this study, we unveiled the evolution of yeast histone 
modification code since gene duplication. Though dupli- 
cate genes represent functional redundancy at histone 
modification level compared with single-copy genes, the 
histone modification divergence occurred along with evo- 
lutionary time {K s as the proxy), which possibly due to the 
coding sequence evolution after gene duplication. More- 
over, the histone modification code in ORF region evolves 
slower than that in promoter region, indicative of func- 
tionally selective constraints on protein sequences. Going 
further, after controlling the confounding effect of the 
coding sequence divergence (Ks), the histone modification 
code co-evolves with cis- (TATA box) and trans- (TF and 
trans-acting eQTL) regulatory factors, confirmed by the 
fact that the histone modification code is shaped by the 
combined interaction among histone-modifying enzymes, 
trans-acting elements and ds-regulatory motif. In 
addition, histone modification makes contribution to the 
expression divergence between duplicate genes, despite 
the minor effect compared to transcription factors and en- 
vironmental stresses. Taking together, we provided the evi- 
dence of the co-evolution between genetic and epigenetic 



elements since gene duplication, together contributing to 
the expression divergence between duplicate genes. 

Methods 

Data of yeast histone modification pattern 

Genome-wide histone modification pattern data of Sac- 
charomyces cerevisiae were downloaded from Chroma- 
tinDB [34] (http://www.bioinformatics2.wsu.edu/cgi-bin/ 
ChromatinDB/cgi/visualize_select.pl). ChromatinDB pro- 
vides the user with easy access to ChlP-microarray data for 
a large set of histones or histone modifications in S. cerevi- 
siae, which includes 17 distinct histone modification com- 
binations like dimethylation in Lys4 of histone H3 
(H3K4me2), acetylation in Lysl2 of histone H4 (H4K12ac), 
etc. and 5 histone protein occupancy levels (H2A, H2B, 
H3, H4, H2A.Z). We applied log base-2 of average enrich- 
ment ratio with nucleosome-normalizing for each of 22 
histone modification data in promoter and open reading 
frame (ORF) regions of genes. 

Yeast microarray expression data 

A total of 84 microarray expression data points of S. cerevi- 
siae whose expression changes are attributing to internal 
disturbing like developmental or physiological conditions 
were respectively collected [35-37]. This type was defined 
as "Normal" conditions, which are not genetically per- 
turbed by regulatory network related elements like tran- 
scription factors and chromatin modifiers (CM) or other 
environmental stresses. We collected 170 gene expression 
profiles of yeast strains mutated for various chromatin 
modifiers from 26 publications [38]. We called this type of 
expression profile data as 'CM_def conditions. Expression 
profile data of 263 transcription factor-deletion experi- 
ments were obtained from the Gene Expression Omnibus 
(GEO) database under the series accession number 
GSE4654 [39]. Similarly, this type was denoted as 'TF_der 
experiments. A total of 504 cDNA microarray data points 
of yeast whose expression changes are attributing to envir- 
onmental stresses were collected [9]. We call this type as 
'Stress' conditions. Normalization was done as each original 
paper recommended. 

Data of trans- and cis- regulatory elements 

Transcription factor (TF)-DNA binding profiles of yeast 
were downloaded from Lee et al. [40] and Harbison 
et al.[41]. In the study of Harbison et al. (2004), we just 
used 203 DNA-binding transcription factors in rich 
media conditions, regardless of 84 regulators in environ- 
mental stressed conditions. Most transcription factors in 
two studies are overlapped. Finally, we obtained 207 TF- 
all S. cerevisiae genes binding interaction profiles. For 
each gene, a p-value was assigned to measure the prob- 
ability of true TF-target interaction; a smaller p-value 
means the interaction is more likely. Here, we used 
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relatively stringent significance level of 0.001 as cutoff to 
define the status of TF-target gene interaction. Two 
studies together determine all TF-target gene interac- 
tions, and we observed that 3183 genes are binding by at 
least one transcription factor (TF-targeted genes). Yeast 
genomic expression quantitative trait loci (eQTLs) data 
were downloaded from Brem and Kruglyak [42]. 
Wilcoxon-Mann- Whitney (WMW) test with the criter- 
ion of 50 kb interval was conducted to detect and define 
eQTL regions [9]. Finally, we obtained 2775 genes which 
at least have one trans-acting eQTL (trans-eQTL acting 
genes). Thus, we can divide all yeast genes into two cat- 
egories, trans- targeted genes and controlling genes, 
where trans -targeted genes are the union of TF-targeted 
genes and trans-eQTL acting genes, while controlling 
genes are the reminders. Since trans-eQTL acting genes 
were not only regulated by transcription factors, but 
most by chromatin related enzymes and other factors 
[43], trans-targeted genes may have the possibility to be 
regulated by all trans-regulators, not restricted to tran- 
scription factors. 

There are two types of genes, TATA-containing genes 
and TATA-less genes [29]. We classified all duplicate pairs 
into three categories, TATA-containing, TATA-less and 
TATA-diverge pairs. TATA-containing and TATA-less 
types are those duplicate pairs where both copies have or 
don't have TATA box, respectively, while TATA-diverge 
type refers to duplicate pairs where one copy belongs to 
TATA-containing genes, and the other TATA-less genes. 

Protein subcellular localization and biological process 

The information of protein localization and biological 
process for S. cerevisiae was defined by the Gene Ontology 
(GO) classification, and downloaded from Saccharomyces 
Genome Database. GO Slim was used to classify 24 cellu- 
lar component sorts and 37 biological process categories. 
A duplicate gene pair was assigned to a GO Slim term if 
both duplicate copies are belonging to this GO term, or 
one copy is belonging to while the other is not annotated. 

Defining functional divergence between yeast 
duplicate genes 

There are two types of histone modification profiles, 
histone modification pattern associated with gene pro- 
moter region and open reading frame (ORF) region. One 
minus Pearsons product-moment correlation coefficient 
(1-r) was used to determine the distance of these two types 
of histone modification pattern between duplicate genes, 
shortly denoted as D HM _ P for promoter histone modifica- 
tion distance and D HM _ Q for ORF histone modification 
distance. 

We modified the Czekanowski-Dice formula [44] to cal- 
culate the distance of transcription factors or trans-acting 
eQTLs shared by duplicate gene 1 and 2 in a duplicate 



pair, shortly denoted as D TF and D^ eQTLf respectively. Sup- 
pose A i2 be the number of TFs or trans -acting eQTLs that 
differ between one duplicate pair; yjUy 2 be the number of 
TFs or trans-acting eQTLs that regulate at least one of 
duplicate genes, and yf\y 2 be the number of shared TFs 
or trans-acting eQTLs between a duplicate pair. Then, the 
TF distance or trans-acting eQTL distance between dupli- 
cate genes 1 and 2 is defined as follows: 

£>tf(1,2) or A- eQ 7x(l,2) 

= A 12 /\yi Uy 2 +yi ny 2 ] (1) 

Apparently, the greater the value, the higher degree of 
TF or trans-acting eQTL divergence between duplicate 
genes. 

We used evolutionary distance (E) defined by Gu et al. 
[5] as the measure of expression divergence between du- 
plicate genes of four types, shortly denoted as E Normah 
£cM_dei> ^TF_dei an d ^stress f° r 'Normal! 'CM_der, 'TF_der 
and 'Stress' experiments, respectively. Specifically, for 
any duplicate gene 1 and 2, let X\^ and x 2 k be its expres- 
sion level, respectively, in the /<th microarray experiment; 
X\ and X2 be the mean of expression level in /<th micro- 
array experiments, respectively, where k = l,...m. The 
formula of the expression distance (E) between gene 1 
and 2 is as follows: 

m 

En = ^2 ^ Xlk ~ ~ ( X2k ~ x i)] 2 /( m ~ x ) ( 2 ) 

k=l 



Determination of yeast duplicate pairs 

The method of Gu et al. [45] was applied to identify du- 
plicate genes. As the criterion of 80% alignable regions 
between protein sequences is too stringent, and may 
miss some duplicate genes, we reduced this criterion to 
50%. All pairs of duplicate genes in each gene family 
were used for the analysis. The reminders of S. cerevisiae 
genes were considered as singleton genes. The rate of 
synonymous substitutions (K s ) and nonsynonymous sub- 
stitutions (K A ) between duplicate genes were estimated 
using PAML [46] with default parameters. 

Additional file 



Additional file 1: Figures S1-S2. are available at online web site of 
BMC Evolutionary Biology journal. 
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methyltransferases; HDMs: Histone demethylases; K s : The rate of synonymous 
substitutions; K A : The rate of nonsynonymous substitutions; TF: Transcription 
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